

AD-A253  575 


AN  APPRECIATION  OF  KOLMOGOROV’S  1933  PAPER 

BY 

M.  A.  STEPHENS 


TECHNICAL  REPORT  NO.  453 
JUNE  15,  1992 




PREPARED  UNDER  CONTRACT 
N00014-92-J-1254  (NR-042-267) 

FOR  THE  OFFICE  OF  NAVAL  RESEARCH 


Reproduction  in  whole  or  in  part  is  permitted 
for any purpose of the United States Government

Approved  for  public  release;  distribution  unlimited 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 



Herbert  Solomon,  Project  Director 

AN  APPRECIATION  OF  KOLMOGOROV’S  1933  PAPER 

by  M.A.  Stephens 


1. Introduction

In  1933,  A.  N.  Kolmogorov  (1933a)  published  a  short  but  landmark 
paper  in  the  Italian  Actuarial  Journal.  He  formally  defined  the  empirical 
distribution  function  (EDF),  and  then  enquired  how  close  this  would  be 
to the true distribution F(x) when this is continuous. This led naturally
to the definition of what has come to be known as the Kolmogorov
statistic (or sometimes the Kolmogorov-Smirnov statistic) D.
Kolmogorov not only demonstrated that the difference between the
EDF and F(x) can be made as small as we please by taking the sample
size n large enough, but also gave a method of calculating the distribution of
D at specified points, for finite n, and used this to give the asymptotic
distribution of D. The ideas in this paper have formed a platform for a
vast literature, both on interesting and important probability problems
and on methods of using the Kolmogorov statistic (and
other statistics) for testing fit to a distribution. This literature
continues  with  great  strength  today,  after  over  50  years,  showing  no 
signs  of  diminishing.  It  is  evident  that  the  ideas  set  in  motion  by 
Kolmogorov  are  of  paramount  importance  in  statistical  analysis,  and 
variations  on  the  probabilistic  problems,  including  modern  methods  of 
treating  them,  continue  to  hold  attention. 

2. A. N. Kolmogorov - early years and position in 1933.

Andrei Nikolaevich Kolmogorov was born on April 25, 1903. His
father  was  an  agronomist  who  later  died  in  the  aftermath  of  the 
Revolution;  his  mother  died  shortly  after  his  birth  and  he  was  brought  up 
by  his  mother's  sister.  He  was  taught  by  his  aunts  until  he  was  seven 
and  then  went  to  a  gymnasium  in  Moscow,  to  which  he  later  gave  much 
credit  for  his  early  training.  He  was  interested  early  on  in  mathematics, 
but  also  in  biology  and  Russian  history:  he  widened  these  interests 
even  more  in  later  life  to  include,  for  example,  methods  of  education  and 
poetry.  He  entered  Moscow  University  in  1920  to  study  physics  and 
mathematics, but continued his studies in history. He was a student
during, of course, very difficult times in Russia and in 1922, to augment
his income, he became a schoolteacher while still a student, a position
he held for three years. Nevertheless, he quickly came to the attention
of  the  Professors  at  Moscow  and,  as  quickly,  began  to  produce  original 
results  in  various  areas  of  mathematics  -  especially  in  set  theory  and 
Fourier  Series.  In  1924  he  began  his  lifetime  interest  in  probability 
theory  and,  in  1925,  published  his  first  paper  in  this  field  with  A.  Y. 
Khinchin.  Also  in  1925,  Kolmogorov  graduated  from  Moscow  University 
and became a postgraduate student; in the next few years he published
fundamental  work  on  laws  of  large  numbers;  he  regarded  such  laws,  the 
study  of  which  began  with  Bernoulli,  as  the  true  beginnings  of 
probability.  By  the  time  he  finished  as  a  postgraduate  (as  in  many 
European  countries  at  the  time,  a  thesis  degree  was  not  deemed 
necessary),  Kolmogorov  had  written  nearly  twenty  mathematical  papers 
and,  in  June  1929,  he  joined  the  Institute  of  Mathematics  and 
Mechanics  at  Moscow  University  as  a  faculty  member.  Two  years  later 
he  became  Professor  and  two  years  more  saw  him  appointed  Director  of 
the  Scientific  and  Research  Institute  of  Mathematics  at  the  University. 
Earlier, he had begun his fundamental work in measure theory applied to
probability,  arising  from  his  concern  to  have  a  rigorous  axiomatic 
foundation  for  the  subject.  This  first  appeared  as  a  paper  in  1929  and 
then  in  1933,  the  same  year  as  the  paper  introduced  here,  he  produced 
his  classical  monograph  on  the  Foundations  of  Probability  Theory, 
which  was  to  prove  so  influential  to  the  development  of  this  subject. 
Between these two works appeared, in 1931, "On methods of analysis in
Probability  Theory",  in  which  he  exhibited  the  relationships  between  the 
theory  of  probability  and  the  classical  analytic  methods  of  theoretical 
physics.  This  too  was  to  become  a  seminal  work  in  the  theory  of  random 
processes. 

The  paper  considered  here  thus  came  when  Kolmogorov  was 
thirty  years  old,  at  the  height  of  his  mathematical  powers,  already 
recognized  in  the  Soviet  Union,  and  increasingly  becoming  so  outside 
its borders. It brilliantly combines his skill in classical
probability arguments with, as we shall see, his abilities in
mathematical analysis.

For the above summary, I am greatly indebted to the review of
Kolmogorov's life by Shiryaev (1989); a biography of Kolmogorov is also
given in Kotz, Johnson and Read (1989).

3.  Summary  of  the  paper. 

In  this  section  the  contents  of  the  paper  will  be  outlined  in  more  detail 
than  that  given  earlier;  in  subsequent  sections  we  show  some  of  the 
ways  in  which  this  short  article  led  to  advances  across  the  broad  fields 
of  probability  and  statistics. 

Suppose  a  random  sample  is  given  of  n  values  of  X;  these  are 
ordered and labelled so that $X_1 \le X_2 \le \dots \le X_n$. In more modern
notation this would be written $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$, but Kolmogorov's
original will be used here.

(a) The function $F_n(x)$, called the empirical distribution function (EDF), is
defined as

\[
F_n(x) =
\begin{cases}
0, & x < X_1; \\
k/n, & X_k \le x < X_{k+1}, \quad k = 1, 2, \dots, n-1; \\
1, & X_n \le x.
\end{cases}
\]
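Definition (a) translates directly into a few lines of code. The following Python sketch is an illustration only: the function name `edf` and the sample values are assumptions made for the example and do not come from Kolmogorov's paper.

```python
from bisect import bisect_right


def edf(sample):
    """Return F_n as a function: F_n(x) = (number of observations <= x) / n."""
    xs = sorted(sample)          # X_1 <= X_2 <= ... <= X_n
    n = len(xs)

    def F_n(x):
        # bisect_right counts the ordered values that are <= x,
        # so F_n jumps by 1/n at each observation and is right-continuous.
        return bisect_right(xs, x) / n

    return F_n


F_n = edf([0.3, 0.1, 0.7, 0.5])
print(F_n(0.05), F_n(0.1), F_n(0.4), F_n(0.7))  # 0.0 0.25 0.5 1.0
```

The step function equals k/n on the interval from the k-th ordered value up to, but not including, the next one, as in the definition.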

(b) Kolmogorov states that we are "almost naturally" led to ask if $F_n(x)$ is
approximately equal to $F(x)$ when n assumes a very large value, and
refers to von Mises' (1931) book which, only two years earlier, had
introduced another statistic to measure how close $F_n(x)$ is to $F(x)$.
Kolmogorov defines

\[ D = \sup_x |F_n(x) - F(x)| \]

and points out the importance of answering whether $\Pr(D < \epsilon)$ tends to 1
as $n \to \infty$, however small the $\epsilon$.

(c) He answers the question by proving the following asymptotic result,
expressed as Theorem I.

Let $\Phi_n(\lambda) = \Pr(D < \lambda/\sqrt{n})$; then $\Phi_n(\lambda)$, as $n \to \infty$, uniformly in $\lambda$, tends to

\[ \Phi(\lambda) = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2\lambda^2} \]

for any continuous distribution function $F(x)$. Some values of $\Phi(\lambda)$ are
given for various $\lambda$; it is pointed out that, for small $\lambda$, the series for $\Phi(\lambda)$ converges
slowly, and the first term of the equivalent formula

\[ \Phi(\lambda) = \frac{\sqrt{2\pi}}{\lambda} \sum_{k=1}^{\infty} e^{-(2k-1)^2\pi^2/(8\lambda^2)} \]

then gives excellent results for $\lambda < 0.6$.
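The two series of Theorem I are easily compared numerically. The sketch below is a minimal illustration in Python; the function names and the truncation at 100 terms are assumptions of the example, and the two formulas used are the alternating series of Theorem I and its standard equivalent form.

```python
import math


def kolmogorov_cdf_alternating(lam, terms=100):
    """Phi(lam) = sum over all integers k of (-1)^k exp(-2 k^2 lam^2), truncated."""
    total = 1.0  # the k = 0 term
    for k in range(1, terms + 1):
        # terms for +k and -k are equal, hence the factor 2
        total += 2.0 * (-1) ** k * math.exp(-2.0 * k * k * lam * lam)
    return total


def kolmogorov_cdf_small_lambda(lam, terms=1):
    """Phi(lam) = sqrt(2 pi)/lam * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 lam^2))."""
    s = sum(
        math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * lam * lam))
        for k in range(1, terms + 1)
    )
    return math.sqrt(2.0 * math.pi) / lam * s


for lam in (0.4, 0.5, 0.6, 1.0, 1.36):
    print(lam, kolmogorov_cdf_alternating(lam), kolmogorov_cdf_small_lambda(lam))
```

With a single term the second function agrees closely with the first for small values of the argument and becomes less accurate as the argument grows, which is the behaviour described in (c).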

(d) The proof of the Theorem first involves the probability integral
transformation $Y = F(X)$, showing that the distribution of Y is $F(y) = y$,
$0 \le y \le 1$, namely the uniform distribution. Also, if $D_Y$ is calculated from
the EDF of the Y-values given by $Y_i = F(X_i)$, $i = 1, 2, \dots, n$, then $D_Y$ will
equal D. Thus the result required may be deduced assuming that the
original values have a uniform distribution between 0 and 1, which we
shall write U(0,1).
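The reduction in (d) also gives the usual way of computing D in practice. The Python sketch below is an illustration under stated assumptions: the function names, the exponential example and the sample values are not from the paper. It uses the fact that the supremum must occur at, or just before, one of the jumps of the EDF.

```python
import math


def kolmogorov_D(sample, cdf):
    """D = sup_x |F_n(x) - F(x)|, computed from the values z_i = F(x_i)."""
    z = sorted(cdf(x) for x in sample)   # probability integral transformation
    n = len(z)
    d_plus = max((i + 1) / n - z[i] for i in range(n))   # EDF above F, at each jump
    d_minus = max(z[i] - i / n for i in range(n))        # F above EDF, just before each jump
    return max(d_plus, d_minus)


def exp_cdf(x):
    """Hypothetical tested distribution: standard exponential."""
    return 1.0 - math.exp(-x)


sample = [0.2, 1.5, 0.7, 3.1, 0.05]
d_x = kolmogorov_D(sample, exp_cdf)                             # D from the X-values
d_y = kolmogorov_D([exp_cdf(x) for x in sample], lambda y: y)   # D_Y from the Y-values
print(d_x, d_y)
```

The two printed values coincide because the statistic depends on the sample only through the transformed values.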

(e) The calculations are based on the following argument. Suppose lines
U ($y = x + d$) and L ($y = x - d$) are drawn parallel to $y = F(x) = x$. For $D < d$,
all the "corners" of $F_n(x)$ must lie between U and L. Suppose $P_{ik}$ is the
probability that $E_{ik}$ occurs: $E_{ik}$ is the event that $F_n(x)$ lies between U and
L at the values $x = j/n$, for all $j \le k$, while also, at $x = k/n$,
$|F_n(k/n) - (k/n)| = i/n$. Clearly $P(D < d)$ is then $P_{0n}$. Kolmogorov gives a formula for $P_{ik^*}$,
where $k^* = k+1$, as a linear combination of the $P_{jk}$; the coefficients
in the expression are conditional probabilities $Q_{ji}(k)$ that $E_{ik^*}$ occurs
given that $E_{jk}$ has occurred. These linear equations can be solved for
$P_{ik}$ and, hence, for the required $P_{0n}$.

For practical calculations, Kolmogorov defines new quantities $R_{ik}$
as functions of the $P_{ik}$; these enable $R_{ik^*}$ to be expressed as linear
combinations of $R_{jk}$, similar to the equations for $P_{ik^*}$, but with easier
coefficients.

At  this  point  Kolmogorov's  analytic  skills  are  brought  to  bear.  A  Theorem 
II  is  given,  describing  the  behavior  of  a  random  walk  with  steps  Yj  which 
are integral multiples of a constant $\epsilon$.

Suppose $S_k = \sum_{j=1}^{k} Y_j$, and let $S_n = i\epsilon$ for some i.

Kolmogorov gives a result for $\bar{R}_{in}$, the probability that $S_k$ always lies
between certain bounds, in terms of the Green's function of classical
mathematical physics. The theorem gives the solution to a much more
general problem than that discussed here; it is not proved in detail, but
reference is made to an existing note and to one forthcoming
(Kolmogorov, 1933b). For the particular problem concerning D, the $Y_j$
are made to be Poisson variables, and $\bar{R}_{in}$ is shown to be the same as
$R_{in}$ in paragraph (e) above.

The steps $\epsilon$ now approach zero, and the random walk becomes "tied
down"  to  zero  at  the  n-th  step,  thus  becoming  the  Brownian  bridge  of 
modern  notation;  application  of  Theorem  II  with  appropriate  boundaries 
gives  the  asymptotic  result  given  in  Theorem  I. 


4. Contemporary work and the impact of the paper.


It  seems  fair  to  say  that  Kolmogorov  regarded  his  paper  as  the 
solution  of  an  interesting  problem  in  probability,  following  his  interests  of 
the time, rather than a paper in statistical methodology. Apart from the
casual remark that $F_n(x)$ should closely estimate $F(x)$ in some sense, no
suggestion is made that $F_n(x)$ should be used for testing that $F(x)$ is the
distribution of X. This was, nevertheless, to become one of the major
outgrowths of the article. Suggestions that $F_n(x)$ should be used for such
a test were in the air at the time. Cramer (1928) had proposed expanding
$F(x)$ in a type of Gram-Charlier series and then using as test statistics
integrals of the type

\[ I_j = \int \{\Delta_j(x)\}^2 \, dx, \quad \text{where } \Delta_j(x) = F_n(x) - F_j(x), \]

and $F_j(x)$ is the expansion of $F(x)$ up to the j-th term. The integral is over
the support of x. The term $\Delta_j(x)$ can be thought of as the j-th component
of the difference $F_n(x) - F(x)$, and the approach is reminiscent of
Neyman's work on smooth tests, which appeared a few years later. In
1931, von Mises (1931) suggested that a test could be based on the
statistic

\[ \omega^2 = n \int \lambda(x) [F_n(x) - F(x)]^2 \, dx \]

where $\lambda(x)$ is a suitably chosen weight function. Von Mises suggested
that $\lambda(x)$ should be constant, chosen so that $E(\omega^2) = 1$, and with this $\lambda(x)$
von Mises gave a computing formula for $\omega^2$. The distribution of the
criterion will vary with $F(x)$ under test (and also of course with $\lambda(x)$) even
when this is completely specified; von Mises gave no distribution theory,
but evaluated some variances of the criterion when the true distribution
is uniform or normal.

Several years later, the Soviet mathematician and statistician
Smirnov (1936, 1937) made a significant change in the definition of $\omega^2$.
This was to write

\[ \omega^2 = n \int \lambda(F(x)) [F_n(x) - F(x)]^2 \, dF(x) \]

so that the integral is with respect to $F(x)$ rather than to x. The criterion
now becomes based on the values of $Z_i = F(X_i)$, which, as was seen
above in paragraph 3(d), will be U(0,1); it will now be distribution-free,
that is, not dependent on the true $F(x)$. This version of the statistic, with
$\lambda(F(x)) = 1$, has come to be known as $W^2$, the Cramer-von Mises
statistic.

A notable achievement of Smirnov was to find the asymptotic distribution
of $W^2$, in the form of a sum of weighted $\chi^2_1$ variables.
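For a sample, the integral defining the Cramer-von Mises statistic reduces to a finite sum over the ordered values of Z. The Python sketch below uses the standard computing formula, which is not quoted in this article and is supplied here as an assumption of the example, as are the function name and the test values.

```python
def cramer_von_mises_W2(z):
    """W^2 = sum_i (z_(i) - (2i - 1)/(2n))^2 + 1/(12n), with z_i = F(x_i)."""
    z = sorted(z)
    n = len(z)
    # With 0-based index i, the 1-based quantity (2i - 1) becomes (2i + 1).
    return sum((z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1.0 / (12 * n)


# Values placed exactly at the midpoints (2i - 1)/(2n) give the minimum 1/(12n).
print(cramer_von_mises_W2([0.25, 0.75]))
```

Because the formula involves only the transformed values, the same function serves for any completely specified continuous distribution.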

Smirnov  (1939  a,b)  was  also  interested  in  Kolmogorov's  work;  he 
extended  it  to  encompass  one-sided  tests  and  also  two-sample  tests. 

Let $D^+ = \sup_x \{F_n(x) - F(x)\}$ and $D^- = \sup_x \{F(x) - F_n(x)\}$;
these will have the same asymptotic distribution, which was found by
Smirnov:

\[ \lim_{n \to \infty} P(\sqrt{n}\, D^+ < \lambda) = 1 - e^{-2\lambda^2}. \]
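Smirnov's limit is simple enough to invert in closed form, which gives asymptotic percentage points for the one-sided statistics. The following Python sketch is illustrative only; the function names are assumptions of the example.

```python
import math


def smirnov_one_sided_cdf(lam):
    """Limiting value of P(sqrt(n) D+ < lam), namely 1 - exp(-2 lam^2)."""
    return 1.0 - math.exp(-2.0 * lam * lam)


def smirnov_one_sided_point(alpha):
    """Value lam whose limiting upper tail probability is alpha."""
    # Solve exp(-2 lam^2) = alpha for lam.
    return math.sqrt(-math.log(alpha) / 2.0)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, smirnov_one_sided_point(alpha))
```

These are large-sample points only; for small n the exact distribution should be used.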


For two samples, suppose $F_n(x)$ and $G_m(x)$ are the EDF's of two
independent random samples of sizes n, m respectively: define
$N = mn/(m+n)$, and let

\[ D^+_{n,m} = \sup_x \{F_n(x) - G_m(x)\}, \quad D^-_{n,m} = \sup_x \{G_m(x) - F_n(x)\} \quad \text{and} \quad D_{m,n} = \sup_x |F_n(x) - G_m(x)|. \]

Smirnov shows that the asymptotic distribution of $\sqrt{N} D_{m,n}$ is the same as
that of $\sqrt{n} D$ given in Kolmogorov's Theorem I.
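The two-sample statistics can be computed by comparing the two EDF's at every observed value, since both are step functions that change only there. The Python sketch below is an illustration; the function names and the data are assumptions of the example.

```python
import math
from bisect import bisect_right


def two_sample_statistics(xs, ys):
    """Return (D+, D-, D) for the EDF's F_n of xs and G_m of ys."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d_plus = d_minus = 0.0   # both EDF's are 0 to the left of all the data
    for v in sorted(set(xs) | set(ys)):
        diff = bisect_right(xs, v) / n - bisect_right(ys, v) / m   # F_n(v) - G_m(v)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus, max(d_plus, d_minus)


def scaled_two_sample_D(xs, ys):
    """sqrt(N) * D with N = mn/(m + n), to be referred to the limit in Theorem I."""
    n, m = len(xs), len(ys)
    return math.sqrt(m * n / (m + n)) * two_sample_statistics(xs, ys)[2]


print(two_sample_statistics([1.0, 2.0, 3.0], [2.5, 4.0, 5.0, 6.0]))
```

The scaled value is compared with the asymptotic points of Theorem I only when both samples are reasonably large.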


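Smirnov (1939a) also examined $\nu_n(\lambda)$, the number of crossings of $F_n(x)$
with the lines $F(x) \pm \lambda/\sqrt{n}$, and showed that as $n \to \infty$, $P(\nu_n(\lambda) < t\sqrt{n})$
converges to

\[ \Phi(t, \lambda) = 1 - 2 \sum_{m=0}^{\infty} \frac{(-1)^m}{m!} \frac{d^m}{dt^m} \left[ t^m \exp\left\{ -\frac{(t + 2\lambda m + 2\lambda)^2}{2} \right\} \right]. \]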


He also gave a new proof of Kolmogorov's Theorem I, and tabulated the
asymptotic distribution $\Phi(\lambda)$ in Smirnov (1939b); and in Smirnov (1944)
he found the distribution of $\sqrt{n} D^+$. The table of $\Phi(\lambda)$ was later
reproduced in English in Smirnov (1948). Statistics of the $D^+$, $D^-$ and $D$
type are often referred to as Kolmogorov-Smirnov statistics.

5. The war and afterwards.

Thus, over a period of about 10 years, a number of distinguished
mathematicians laid the foundations of methods of testing fit to a
distribution based on the EDF. To test the null hypothesis $H_0$ that $F(x)$,
completely specified, is the true distribution of X, the statistics above
may be calculated and referred to the appropriate distribution.

At  this  point  the  war  intervened  and  much  momentum  in  this  field 
was  certainly  lost.  Kolmogorov,  himself,  became  involved  in  war  work 
(he  worked,  for  example,  on  artillery  problems),  which  certainly  brought 
him  into  greater  contact  with  statistical  analysis,  and  may  account  for  an 
increasing  interest  in  statistics  itself.  In  1948  he  edited  and  wrote  a 
preface  to  the  Russian  edition  of  Cramer’s  Mathematical  Methods  of 
Statistics:  he  protested  the  overly  theoretical  basis  of  the  training  of 
Soviet  statisticians,  a  lament  familiar  enough  outside  the  Russian 
borders.  Perhaps,  also,  Kolmogorov  was  impressed  by  Cramer's 
opening,  which  gives  great  credit  to  British  and  American  statisticians 
for  advances  in  statistics,  while  admiring  France  and  Russia  for  their 
excellence  in  probability:  at  any  rate,  in  that  year  he  spoke  at  a  Tashkent 
Conference  on  Mathematical  Statistics,  on  "Basic  problems  of 
Theoretical  Statistics"  and  also  enlightened  the  assembled  statisticians 
on  "The  real  meaning  of  the  Analysis  of  Variance".  This  was  to  be 
followed,  over  the  years,  by  many  more  contributions  to  the  mainstream 
of  statistics,  while,  of  course,  his  other  wide  interests  were  maintained. 
These  came  to  include,  with  the  passing  years,  a  strong  interest  in  the 
teaching  of  both  mathematics  and  statistics. 

In  the  1950’s  there  was  a  surge  of  interest  in  Russia  in  the  Kolmogorov- 
Smirnov  statistics,  particularly  in  the  combinatoric  problems  associated 
with  crossings  and  with  two-sample  statistics.  Gnedenko  and  Korolyuk 
(1951) found the exact distributions of $D^+_{n,n}$ and of $D_{n,n}$, to compare two
empirical distributions from independent samples both of size n; later
Korolyuk (1955) found exact distribution theory when m is an integral
multiple of n, m = np. By allowing $p \to \infty$, he deduced the exact
distribution of $D^+$, and also the more difficult distribution of D itself.
Gnedenko and Rvaceva (1952) obtained the joint distribution of $D^+_{n,n}$
and $D^-_{n,n}$ and verified the asymptotic joint distribution already found by
Smirnov in 1939; further results were given by Gnedenko (1952).
Gnedenko and Mihalevic (1952a,b) discussed the number of crossings,
when one distribution function $F_n(x)$ crosses the other $G_m(x)$. The
interest spread to Hungary and across Asia to China; Renyi (1953)
proposed several variations of Kolmogorov's statistic, such as
$\sup_x |\{F_n(x) - F(x)\}/F(x)|$; Chang Li-Chien (1955) examined the ratio
$F_n(x)/F(x)$, closely related to Renyi's statistics, and Cheng Ping (1958)
gave  further  results  on  crossings. 

Meantime, in the western world also, EDF statistics were
attracting attention. An elegant paper by Feller (1948) appeared, giving
more accessible proofs of the results of both Kolmogorov and Smirnov:
where Kolmogorov used Green's function to find the asymptotics from the
equations for $P(D < c/n)$, Feller introduced generating functions for the
component probabilities and then examined their limiting forms. He also
gave a theorem on the asymptotic expectation of the number of
crossings $\nu_n(\lambda)$ of $F_n(x)$ with the boundaries $F(x) \pm \lambda/\sqrt{n}$. At about the
same time, there were significant advances in methodology. Doob, in
1949, suggested that the asymptotic behaviour of EDF statistics based
on $e_n(x) = F_n(x) - F(x)$ could be found by examining the limiting
behaviour of $e_n(x)$, a Gaussian process, and calculating the statistics
from this limiting process. According to Khmaladze (1986), in an article
presenting the 1933 paper in Kolmogorov's collected works,
Kolmogorov himself put forward similar ideas in a Moscow seminar
towards the end of 1948, and Smirnov (1949) wrote a brief paper on the
asymptotics of the Cramer-von Mises statistic. These ideas, those of
Doob made rigorous by Donsker (1952), laid the foundation for a great
deal of later work on the asymptotics of EDF statistics. Anderson and
Darling (1952) used them to examine such statistics and introduced the
statistic $A^2$, for which the weight function in Smirnov's version of $\omega^2$ is
$1/[F(x)\{1-F(x)\}]$. This compensates for the fact that $e_n(x)$ must
necessarily become small in the tails, by essentially dividing by the
variance of $e_n(x)$, and gives due weight to tail observations.
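As with the Cramer-von Mises statistic, the Anderson-Darling integral reduces to a sum over the ordered transformed values. The Python sketch below uses the standard computing formula, which is not given in this article and is an assumption of the example, together with the function name; the values must lie strictly between 0 and 1.

```python
import math


def anderson_darling_A2(z):
    """A^2 = -n - (1/n) sum_i (2i - 1) [ln z_(i) + ln(1 - z_(n+1-i))], z_i = F(x_i)."""
    z = sorted(z)
    n = len(z)
    s = sum(
        # 0-based i: (2i - 1) becomes (2i + 1), and z_(n+1-i) becomes z[n - 1 - i]
        (2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
        for i in range(n)
    )
    return -n - s / n


# The same spread of values gives a larger A^2 when moved into the tail.
print(anderson_darling_A2([0.4, 0.5, 0.6]), anderson_darling_A2([0.01, 0.02, 0.03]))
```

The logarithms grow without bound as a value approaches 0 or 1, which is how the statistic gives weight to tail observations.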

These developments demonstrated elegant techniques of
combinatorics and analysis in the field of probability but, apart from some
asymptotic tables, the practical statistician was largely neglected.
However,  in  the  1950's,  other  authors  were  filling  the  gap.  Massey 
(1950,  1951a),  and  Birnbaum  and  Tingey  (1951),  using  new  formulas 
and  difference  equations,  gave  tables  of  percentage  points  and  of 
probabilities for finite sample size n, for $D$ and $D^+$; these were later
augmented by Miller (1956). Birnbaum (1952), using the original
techniques of Kolmogorov himself, gave complete tables of the
distribution of D, and a table of percentage points for n up to 100. Thus,
at last - nearly twenty years after the statistic was suggested! - practical
formulas and tables existed to make D available to test that $F(x)$
is a completely specified continuous distribution.

Many  years  later  again,  Stephens  (1970)  used  these  tables  to 
derive a modification of D. This is a simple expression in D and n which
gives D*; this is to be compared, for testing purposes, with the
asymptotic points for $\sqrt{n} D$ given by Kolmogorov's Theorem I. The test is
thus made easy to use without extensive tables of points for every n.
Stephens (1970) also found similar modifications for $D^+$ and $D^-$, for
$V = D^+ + D^-$ (see Section 7 below) and for the Cramer-von Mises $W^2$.
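A short sketch shows how such a modification is used. The form D* = D(sqrt(n) + 0.12 + 0.11/sqrt(n)) used below is the one commonly quoted for the completely specified case; the constants are not stated in this article and should be checked against Stephens (1970) before serious use. The function names and the numerical example are likewise assumptions of this Python illustration.

```python
import math


def stephens_modified_D(d, n):
    """Modified statistic D* (constants 0.12 and 0.11 assumed, see the text above)."""
    root_n = math.sqrt(n)
    return d * (root_n + 0.12 + 0.11 / root_n)


def kolmogorov_upper_tail(lam, terms=100):
    """1 - Phi(lam) = 2 sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2), from Theorem I."""
    return 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )


d_star = stephens_modified_D(0.27, 25)        # hypothetical D = 0.27 from n = 25
print(d_star, kolmogorov_upper_tail(d_star))  # approximate significance level
```

The modified value is referred directly to the asymptotic distribution, so no table indexed by n is needed.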

For  two  samples,  Massey  (1951b)  and  Drion  (1952)  gave  tables 
for $D_{n,n}$ and Massey (1952) for $D_{m,n}$, mostly for n = mp, where p is an
integer;  practical  formulas  for  the  calculation  of  these  statistics  also 
began  to  appear  in  the  literature.  It  was  also  pointed  out  (Wald  and 
Wolfowitz  (1939),  Massey  (1950),  Birnbaum  and  Tingey  (1951))  that  D 
can  be  used  to  give  a  confidence  interval  for  F(x),  and  D+  a  one-sided 
interval. 

At  this  point  all  attempt  will  be  abandoned  to  survey  exhaustively 
the  enormous  literature  which  has  grown  up  on  Kolmogorov-Smirnov 
statistics and on other EDF statistics; many more properties of $D$, $D^+$
and $D^-$ have been discovered, new methods of computing distributions
have been proposed, and variants of the basic statistics have been
suggested. Durbin (1973) provides a comprehensive and unifying
account of developments up to that time, with many references;
Niederhausen (1981b) also has references and brings together many of
the computational procedures. A survey of goodness-of-fit tests is in
Kendall and Stuart (1979, Vol. 2, Chap. 30) and another was given by
Sahler (1968).

6. The problem of unknown parameters.

Despite  the  interest  of  mathematical  statisticians,  and  the 
availability of tables, it has taken many years for the Kolmogorov-Smirnov
statistics, and other EDF statistics, to become part of the regular
arsenal of applied statisticians. No doubt this is because major new
problems are presented if tests are to be made on $F(x)$, which we now
call $F(x;\theta)$, when $F(x;\theta)$ is a continuous distribution containing
parameters which are components of the vector $\theta$, and when one or
more of these components must be estimated from the given data set.
For the well-established Pearson $X^2$ test, provided the estimation of
parameters is done correctly - but how often it is not! - the asymptotic
distribution on $H_0$ merely changes its degrees of freedom, but for $D^+$, $D^-$
and $D$ (and for other EDF statistics) the distribution theory will depend
on the particular $F(x;\theta)$ being tested. This is so even when the unknown
components of $\theta$ are estimated by maximum likelihood or another
efficient method: the distributions, even asymptotic, are now
stochastically much smaller than for the case when $F(x;\theta)$ is completely
known. For Kolmogorov-Smirnov statistics, they depend asymptotically
on the distribution of the maximum of a Gaussian process with mean
zero, tied down at 0 and 1: even though the covariance can be found,
this distribution remains unknown and the early techniques of
Kolmogorov will not find it. The discovery of the asymptotics of $D^+$, $D^-$
and $D$, when parameters must be estimated, thus remains a major
theoretical problem in the area of Kolmogorov-Smirnov statistics.

If the unknown components of $\theta$ are only location or scale
parameters, however, the distribution theory of all EDF statistics, even
for finite n, will depend only on the family tested, and not on the true
values of these parameters, a fact early recognized by David and
Johnson (1948). In these circumstances, Durbin (1973, 1975) has
shown how exact distributions of $D^+$ and $D$ can be calculated for the
exponential distribution $F(x;\theta) = 1 - \exp(-x/\theta)$, $x > 0$, with unknown scale
$\theta$, and has provided points for test purposes; for other distributions,
including the normal, extreme-value, Weibull, and logistic distributions,
several authors have produced Monte Carlo tables. For Cramer-von
Mises statistics, the situation is different: asymptotic distributions can be
found (see, e.g., Darling, 1955, Durbin, 1973, Stephens, 1976) and
percentage  points  for  finite  n  converge  rapidly  to  the  asymptotic  points. 
Also,  for  some  important  distributions  with  shape  parameters,  for 
example,  the  von  Mises  and  Gamma,  the  asymptotic  points  for  Cramer- 
von  Mises  statistics  do  not  depend  strongly  on  the  true  value  of  the 
shape,  and  a  test  using  the  estimated  shape  can  be  used  (Lockhart  and 
Stephens,  1985  a,b).  The  tests  described  above,  for  parameters  known 
or  unknown,  have  been  collected  together  in  Stephens  (1986). 
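The Monte Carlo approach mentioned above is easily sketched for the exponential case with estimated scale. The Python code below is an illustration only: the function names, the number of replications and the seed are assumptions of the example, and the output is a rough simulated point, not a substitute for the published tables.

```python
import math
import random


def D_exponential_scale_estimated(sample):
    """D for the exponential family with the scale estimated by the sample mean."""
    theta_hat = sum(sample) / len(sample)   # maximum likelihood estimate of the scale
    z = sorted(1.0 - math.exp(-x / theta_hat) for x in sample)
    n = len(z)
    d_plus = max((i + 1) / n - z[i] for i in range(n))
    d_minus = max(z[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


def monte_carlo_upper_point(n, level=0.05, reps=2000, seed=1):
    """Simulated upper percentage point of D under the null hypothesis."""
    rng = random.Random(seed)
    values = sorted(
        # The true scale is taken as 1; by scale invariance the choice is immaterial.
        D_exponential_scale_estimated([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return values[math.ceil((1.0 - level) * reps) - 1]


if __name__ == "__main__":
    n = 20
    print("simulated 5% point, scale estimated:", monte_carlo_upper_point(n))
    print("asymptotic 5% point, scale known:   ", 1.358 / math.sqrt(n))
```

The simulated point falls well below the point for the completely specified case, which is the stochastic reduction described in the text; the value 1.358 is the usual asymptotic 5% point of Theorem I.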

7.  Further  developments 

We  conclude  this  introduction  by  giving  only  a  brief  summary  of 
some  of  the  more  important  developments  of  Kolmogorov-Smirnov  tests, 
with  references  either  to  basic  introductory  sources  or  to  articles  which 
themselves  survey  the  particular  area  and  give  references. 

Kolmogorov-Smirnov  tests  have  been  developed  for  use  with 
right- or left-censored data (or both): these mostly use D, but some
variations  of  Renyi-type,  such  as  taking  the  supremum  of  Fn(x)  -  F(x) 
over  a  restricted  range  of  F(x)  or  of  Fn  (x)  have  also  been  suggested. 
Randomly  censored  data  is  an  important  problem,  for  example,  with 
survival  data:  tests  with  such  data  often  use  the  Kaplan-Meier  estimate 
of  F(x).  Hall  and  Wellner  (1980)  give  a  review  and  show  how 
confidence  bounds  for  the  distribution  can  be  found.  A  recent 
technique  for  censored  data  is  given  by  Guilbaud  (1988). 

The statistic $V = D^+ + D^-$ has been proposed (Kuiper, 1960) for
use with data on a circle, because the value of V, in contrast to those of
$D^+$, $D^-$ or $D$, does not depend on the choice of origin. Of course V can
also be used for data on a line. Pettitt and Stephens (1977) produced
tables for D for the uniform distribution for discrete data, and
Niederhausen (1981a) for a variance-weighted D, similar to $A^2$. A test
for  symmetry  of  a  distribution  was  proposed  by  Smirnov  (1947)  and  has 
since  been  extended:  Gibbons  (1983)  gives  a  review  of  such  tests. 
Tables  for  some  of  the  above  tests,  and  further  discussion  and 
references,  are  in  Stephens  (1983,  1986).  An  interesting  area  for  future 
work  is  to  provide  tests  for  multivariate  distributions. 
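The invariance of Kuiper's V under a change of origin can be checked on a small example. In the Python sketch below, points on the circle are expressed as fractions of a full turn in [0, 1) and tested for uniformity; the function names and the data are assumptions of the illustration.

```python
def d_plus_minus_uniform(z):
    """Return (D+, D-) for values z in [0, 1) tested against the uniform distribution."""
    z = sorted(z)
    n = len(z)
    d_plus = max((i + 1) / n - z[i] for i in range(n))
    d_minus = max(z[i] - i / n for i in range(n))
    return d_plus, d_minus


def kuiper_V(z):
    """Kuiper's statistic V = D+ + D-."""
    d_plus, d_minus = d_plus_minus_uniform(z)
    return d_plus + d_minus


points = [0.1, 0.2, 0.3]                        # fractions of a full turn
shifted = [(p - 0.15) % 1.0 for p in points]    # same points, origin moved by 0.15

print(max(d_plus_minus_uniform(points)), max(d_plus_minus_uniform(shifted)))  # D changes
print(kuiper_V(points), kuiper_V(shifted))                                    # V does not
```

Moving the origin changes which observation is counted first, and so changes D, while the sum of the two one-sided statistics is unaffected.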

Statistics closely related to $D^+$, $D^-$ and $D$ were proposed by
Pyke (1959). Suppose $x_i$, $i = 1, \dots, n$, are the order statistics for a sample
from the uniform distribution; $C^+$ is $\max_i (x_i - i/(n+1))$, $C^- = \max_i (i/(n+1) - x_i)$,
and $C = \max(C^+, C^-)$. These arise naturally in examining the Poisson
process,  or  the  periodogram  in  time  series  analysis:  they  are  discussed 
by  Durbin  (1973). 

8.  Power 

In terms of power, Kolmogorov-Smirnov tests tend to fall between
the Pearson $X^2$ and the Cramer-von Mises tests. On the one hand, this
might be expected, since $X^2$ loses information in a test for a continuous
distribution by grouping the data into cells. Kac, Kiefer and Wolfowitz
(1955) showed that if equi-probable cells are used for $X^2$, and if
$\Delta = \sup_x |F_1(x) - F(x)|$ where $F_1(x)$ is the true distribution and $F(x)$ the
tested distribution, D requires $n^{4/5}$ observations compared with n
observations for $X^2$ to attain the same power for a given $\Delta$, for large n.
Thus in these circumstances, $X^2$ will have asymptotic relative efficiency
equal to zero compared with D. Many Monte Carlo studies have
confirmed this superiority of D over $X^2$ in most situations, especially
with small samples.

On  the  other  hand,  Cramer-von  Mises  statistics  might  well  be 
expected  to  be  superior  to  D,  since  they  make  a  comparison  of  Fn(x) 
with  F(x)  all  along  the  range  of  x,  rather  than  looking  for  a  marked 
difference at one point. If the alternative is directional, that is, if
$F_1(x) - F(x)$ is mostly positive or mostly negative, the one-sided $D^+$ or $D^-$ can
be very powerful. Of all the different families of goodness-of-fit statistics,
Cramer-von Mises statistics provide overall powerful tests (Stephens,
1974; see also Kendall and Stuart (1979) and Stephens (1986) for more
discussion).

9.  Concluding  Remarks. 

If  the  remarks  above  on  power  may  appear  to  weaken  the  appeal  of  D 
and  its  related  statistics,  it  should  nonetheless  be  emphasized  that  they 
are preferable to the much used $X^2$ statistic. They also have the value
that  they  can  be  used,  by  simply  adding  a  constant  to  Fn(x),  and 
subtracting  it  from  Fn(x),  to  give  a  confidence  interval  for  F(x)  -  an 
attraction  in  today's  world  where  graphical  display  is  increasingly 
available. 
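The confidence interval described above is simple to compute and to plot. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the observations are taken to be distinct, and the default constant 1.358 is the usual asymptotic 5% point of $\sqrt{n} D$, so that the band has roughly 95% coverage only for large n.

```python
import math


def confidence_band(sample, lam=1.358):
    """Return (x, lower, upper) at each ordered observation.

    The band is F_n(x) -/+ lam/sqrt(n), clipped to the interval [0, 1].
    """
    xs = sorted(sample)
    n = len(xs)
    d = lam / math.sqrt(n)   # the constant added to and subtracted from F_n(x)
    band = []
    for i, x in enumerate(xs, start=1):
        f_n = i / n          # value of F_n at the i-th ordered observation
        band.append((x, max(0.0, f_n - d), min(1.0, f_n + d)))
    return band


for x, lower, upper in confidence_band([3.0, 1.0, 2.0, 4.0]):
    print(x, lower, upper)
```

For small samples the constant should be replaced by the exact percentage point of D for the given n, taken from tables.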

The  final  assessment  of  the  article  by  Kolmogorov  must  be  based  not 
only  on  the  elegance  and  power  of  the  paper  itself,  but  also  on  the 
pioneering  role  it  has  played  in  the  development  of  statistics  in  the 
succeeding  50  years  and  more.  It  launched  seriously  the  use  of  the 
EDF Fn(x) as an estimator of F(x), to be followed by its use in testing a
given  F(x);  it  was  the  first  article  to  give  a  statistic  which  would  not 
depend  (when  the  null  hypothesis  was  true)  on  the  distribution  F(x) 
tested;  it  was  also  the  first  to  introduce  a  statistic  whose  asymptotic 
distribution  could  be  found  and  easily  tabulated.  Kolmogorov  also  gave 
the  essential  technique  to  find  the  distribution  for  finite  samples.  More 
than  50  years  later,  interest  in  Kolmogorov's  and  other  EDF  statistics 
continues  unabated.  It  is  fitting,  in  conclusion,  to  note  the  resurgence  of 
Fn(x)  in  the  wide  use  of  the  bootstrap:  this  technique,  making  use  of  the 
power  which  modern  computers  provide,  is  based  on  the  use  of  Fn(x)  to 
estimate  F(x),  just  as  was  proposed  by  Kolmogorov  in  1933. 

This  article  was  written  to  introduce  Kolmogorov's  paper  in  a  forthcoming 
volume  on  the  most  influential  articles  in  Statistics,  to  be  edited  by 
S. Kotz and N. L. Johnson.